library(GSODR)
library(mosaic)
library(tidyverse)
library(pander)
library(DT)
library(ggrepel)
library(plotly)
library(dplyr)
library(ggplot2)
library(maps)
library(tmap)
library(leaflet)
library(htmltools)
library(car)
library(mosaicData)
library(ResourceSelection)
library(reshape2)
library(RColorBrewer)
library(scatterplot3d)
library(readr)
library(prettydoc)
library(knitr)
library(kableExtra)
library(formattable)
library(sf)
library(ggspatial)
library(leaflet.extras)
library(bslib)
library(shiny)
library(broom)
library(MASS)

testingcenter <- read_csv("C:/Users/paige/OneDrive/Documents/Fall Semester 2024/MATH 325/Statistics-Notebook-master/Data/Testingcenterscores.csv")

Background


In Fall Semester 2024, a new protocol required students in 100-level math classes to take their exams in Brigham Young University-Idaho's Testing Center. The change came after faculty suspected cheating on exams: in the previous semester (Spring 2024), students could take tests without a proctor in quiet places such as dorm rooms or isolated campus spaces. Preventing cheating is a valid reason to require Testing Center exams, but is this new requirement severely affecting average test scores?



To investigate how Testing Center exams affect scores, I requested exam data from teachers of 100-level classes during Spring and Fall 2024. The test scores come from classes 100B (Beginning Algebra) and 101 (Intermediate Algebra).

To ensure a fair comparison between semesters, I collected scores from the midpoint of both terms. For each exam, I recorded the teacher (Baird, Ballou, Oldyroyd, or Ashcraft), whether it was taken in the Testing Center (In or Out), and the student’s score.

datatable(testingcenter, options=list(lengthMenu=c(3,10,30)))



Analysis


To analyze the effects of the Testing Center, we will conduct a two-way ANOVA test. This test shows both the overall impact of Testing Center exams and whether that impact differed among each teacher's students. Below is the two-way ANOVA test expressed as a mathematical model:

\[\underbrace{Y_{ijk}}_\text{Test Scores} = \mu + \alpha_i + \beta_j + \alpha\beta_{ij} + \epsilon_{ijk} \]




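| Part | What does this mean? |
|---|---|
| \(Y_{ijk}\) | The test score of student \(k\) who had teacher \(i\) and took the exam in testing location \(j\) |
| \(\mu\) | The grand mean (the average Y-value, i.e., the average test score, ignoring all information contained in the factors) |
| \(\alpha_i\) | The first factor, Teacher, whose levels are the specific teachers: Brother Baird (BB), Sister Ballou (SB), Sister Oldyroyd (O), and Sister Ashcraft (A) |
| \(\beta_j\) | The second factor, Testing Center, whose levels are where the test was taken: IN the testing center or OUT of the testing center |
| \(\alpha\beta_{ij}\) | The interaction of the two factors, which has 8 levels (4 teachers x 2 testing locations = 8) |
| \(\epsilon_{ijk}\) | The error term: how far an individual student's score falls from the mean of their teacher-and-location group |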
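To make the model concrete, here it is written out for a single cell: a student of Brother Baird who took the exam in the Testing Center. The normal-error assumption and the sum-to-zero constraints below are the standard textbook conventions for a two-way ANOVA model; they are assumed for this illustration rather than stated elsewhere in this analysis. The error assumption is what the diagnostic plots later check.

\[
Y_{BB,\,In,\,k} = \mu + \alpha_{BB} + \beta_{In} + \alpha\beta_{BB,\,In} + \epsilon_{BB,\,In,\,k}, \qquad \epsilon_{ijk} \sim N(0, \sigma^2)
\]

\[
\sum_i \alpha_i = 0, \qquad \sum_j \beta_j = 0, \qquad \sum_i \alpha\beta_{ij} = 0 \text{ for each } j, \qquad \sum_j \alpha\beta_{ij} = 0 \text{ for each } i
\]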



Structuring our Questions


Based on the mathematical model above, we can formally state our hypotheses about the exam scores. The table below outlines each hypothesis and what it aims to determine.

| Null hypothesis | Alternative hypothesis | What does this mean? |
|---|---|---|
| \(H_0 : \alpha_{BB} = \alpha_{SB} = \alpha_O = \alpha_A = 0\) | \(H_a : \alpha_i \neq 0 \text{ for at least one } i \in \{BB, SB, O, A\}\) | 1. Does the math teacher a student takes affect average test scores? |
| \(H_0 : \beta_{In} = \beta_{Out} = 0\) | \(H_a : \beta_j \neq 0 \text{ for at least one } j \in \{In, Out\}\) | 2. Does the use of the testing center affect average exam scores? |
| \(H_0 : \alpha\beta_{ij} = 0 \text{ for all } i, j\) | \(H_a : \alpha\beta_{ij} \neq 0 \text{ for at least one } i, j\) | 3. Does the effect of the testing center vary by teacher? (Alternatively, does the teacher influence how students perform in the testing center?) |

Throughout the study, we will compare each probability value (p-value) against a level of significance to decide whether each factor is significant. Our level of significance will be:

\[ \alpha = 0.05 \]

With those parameters in place, we can proceed with our test.
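The decision rule itself is simple: reject \(H_0\) when the p-value is below \(\alpha\). As a small sketch, the R code below applies that rule to the three p-values from the ANOVA table in the Two-Way ANOVA Test section, typed in by hand as displayed (so they are rounded).

```r
# Level of significance used throughout the study
alpha <- 0.05

# p-values as displayed (rounded) in the two-way ANOVA table
p_values <- c(
  Teacher = 0.00041,
  `Testing Center` = 2.7242e-16,
  `Teacher:Testing Center` = 0.95054
)

# Reject H0 when p < alpha; otherwise we fail to reject H0
decisions <- ifelse(p_values < alpha, "Reject H0", "Fail to reject H0")
print(decisions)
```

Failing to reject \(H_0\) for the interaction means we found no evidence of an interaction, not proof that none exists.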


Two-Way ANOVA Test


The table below shows the results of the two-way ANOVA test on the 100-level math exam scores.

The only column we need is p.value, since it tells us whether each factor is significant. A p-value colored blue (less than 0.05) means that factor is significant; a p-value colored red (greater than 0.05) means it is not significant.

According to our results, both the specific professor and the Testing Center significantly affect exam scores, while there is no evidence that the effect of the Testing Center differed between teachers' students. In other words, exam scores depend on the teacher and on whether the exam was taken in the Testing Center, and these two effects do not appear to depend on each other.
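One caveat about reading the table: `aov()` reports sequential (Type I) sums of squares, so each term is adjusted only for the terms listed before it in the formula. If the number of exams differs between teacher-by-location cells (the cell counts are not shown here, so this is an open question), the Teacher and Testing Center rows can change when the formula order is swapped. The sketch below demonstrates the effect on simulated data; with the real data, refitting with `` `Testing Center` `` listed first, or calling `car::Anova()` with `type = 2`, would be a reasonable robustness check.

```r
# SIMULATED data, for illustration only: two teachers x two locations with
# deliberately unequal cell sizes (an unbalanced design).
set.seed(325)
cells <- expand.grid(teacher = c("T1", "T2"), location = c("Out", "In"))
n_per_cell <- c(30, 10, 10, 30)
sim <- cells[rep(seq_len(nrow(cells)), times = n_per_cell), ]

# Simulated truth: a 5-point drop inside the testing center, no teacher effect
sim$score <- 88 - 5 * (sim$location == "In") + rnorm(nrow(sim), sd = 8)

# Same model, two term orders; anova() on an lm gives sequential (Type I)
# sums of squares, just like summary(aov(...)).
ss_teacher_first  <- anova(lm(score ~ teacher * location, data = sim))["teacher", "Sum Sq"]
ss_teacher_second <- anova(lm(score ~ location * teacher, data = sim))["teacher", "Sum Sq"]

# The two values differ because teacher and location counts are unbalanced
print(c(teacher_first = ss_teacher_first, teacher_second = ss_teacher_second))
```

In a balanced design (equal counts in every cell), the two orderings would give the same sums of squares.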

TCanova <- aov(`Test Scores` ~ Teacher + `Testing Center` + Teacher:`Testing Center`, data = testingcenter)

tcova <- tidy(TCanova)

tcova %>% 
  mutate(
    `p.value` = ifelse(
      `p.value` < 0.0001, 
      format(`p.value`, scientific = TRUE, digits = 5),  
      round(`p.value`, 5)
    ),
    `p.value` = cell_spec(
      `p.value`, "html", 
      color = ifelse(
        is.na(`p.value`), 
        "black", 
        ifelse(as.numeric(`p.value`) < 0.05, "dodgerblue", "red")
      )
    )
  ) %>%
  kbl(escape = FALSE, col.names = c("Term", "df", "sumsq", "meansq", "statistic", "p.value")) %>%
  kable_styling("striped", full_width = TRUE)
| Term | df | sumsq | meansq | statistic | p.value |
|---|---|---|---|---|---|
| Teacher | 3 | 4528.91303 | 1509.63768 | 6.0716385 | 0.00041 |
| Testing Center | 1 | 16893.49642 | 16893.49642 | 67.9442520 | 2.7242e-16 |
| Teacher:Testing Center | 3 | 86.79945 | 28.93315 | 0.1163667 | 0.95054 |
| Residuals | 2418 | 601205.74097 | 248.63761 | NA | NA |
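To show where the statistic and p.value columns come from, this sketch recomputes them from the df and sumsq columns of the table above: each mean square is its sum of squares divided by its degrees of freedom, the F statistic is a term's mean square divided by the residual mean square, and the p-value is the upper-tail area of the F distribution.

```r
# Residual mean square from the ANOVA table: sumsq / df
ms_residual <- 601205.74097 / 2418

anova_rows <- data.frame(
  term  = c("Teacher", "Testing Center", "Teacher:Testing Center"),
  df    = c(3, 1, 3),
  sumsq = c(4528.91303, 16893.49642, 86.79945)
)

# Mean square for each term, then F = term mean square / residual mean square
anova_rows$meansq <- anova_rows$sumsq / anova_rows$df
anova_rows$F <- anova_rows$meansq / ms_residual

# Upper-tail probability of the F distribution with (df, 2418) degrees of freedom
anova_rows$p <- pf(anova_rows$F, df1 = anova_rows$df, df2 = 2418, lower.tail = FALSE)
print(anova_rows)
```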


The next section checks whether the residuals meet the ANOVA requirements, which tells us whether we can really trust the results of our ANOVA test.



Diagnostic Plots


Before diving deeper into each factor, we must first verify whether our two-way ANOVA results are reliable by checking whether our data meets the ANOVA test requirements.


The two ANOVA test requirements we can check with these plots are as follows:

  1. Constant (Equal) Variance
  • checking that the residuals have roughly the same spread in every group

  • What we want: a random scatter of points with a similar vertical spread in each group of the Residuals vs Fitted plot

  2. Normal Error Terms (Residuals)
  • checking that the residuals follow a normal distribution

  • What we want: all points following the dashed line in the Normal Q-Q plot

(ANOVA also assumes that the exam scores are independent of one another, which these plots cannot check.)

par(mfrow=c(1,2))

plot(TCanova, which=1:2, pch=16)

In the Residuals vs Fitted plot, the vertical spread of the dots differs from one cluster to the next, so the data does not appear to meet the Constant Variance requirement. In particular, the two clusters on the far right have noticeably different lengths than the others.

Since the points in the Normal Q-Q plot deviate from the dashed line at both ends, the data does not appear to meet the Normal Error Terms requirement. This deviation indicates that the residuals are strongly skewed rather than close to normal, most likely because of outliers among the lower test scores.


Because our data fails to meet both requirements, we should interpret our results with caution; we cannot promise that they are definitive. We will carefully analyze our ANOVA test results, graphical summaries, and numerical summaries with these limitations in mind.
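The visual checks can be supplemented with formal tests. The sketch below uses simulated, deliberately skewed data with unequal group spreads, and two base R tests: `fligner.test()`, a rank-based test of equal variances that is less sensitive to non-normality than Bartlett's test, and `shapiro.test()` on the model residuals. For the real model, `car::leveneTest(TCanova)` and `shapiro.test(residuals(TCanova))` would be the analogous calls (`shapiro.test()` accepts up to 5000 values, and this data has 2426 exams). With this many exams, these tests can flag even small departures, so they complement the plots rather than replace them.

```r
# SIMULATED data for illustration only: four groups whose scores fall below
# 100 by exponentially distributed amounts, so the residuals are skewed and
# the groups have different spreads (an exponential's sd equals its mean).
set.seed(2024)
group_scales <- c(A = 3, B = 6, C = 9, D = 12)
sim <- data.frame(group = factor(rep(names(group_scales), each = 60)))
sim$score <- 100 - rexp(nrow(sim), rate = 1 / group_scales[as.character(sim$group)])

fit <- aov(score ~ group, data = sim)

# Equal-variance check: a small p-value is evidence that group spreads differ
var_check <- fligner.test(score ~ group, data = sim)

# Normality check on the residuals: a small p-value is evidence of non-normality
norm_check <- shapiro.test(residuals(fit))

print(c(equal_variance_p = var_check$p.value, normality_p = norm_check$p.value))
```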





Does the math teacher a student takes affect average test scores?


As a reminder, this was our p-value for this first factor:

\[\text{Teacher p-value} = 0.0004095 < \alpha\]


Since the p-value is less than our level of significance, teachers have a significant effect on student exam scores. This aligns with teachers’ fundamental role in guiding students toward academic success.

The graph and table below compare scores and averages across professors' classes, regardless of whether the student took the exam in the testing center. Students in Brother Baird's 100-level classes have the highest average score (87.96), with Sister Ballou's students a very close second (87.94), followed by Sister Ashcraft's (85.28) and Sister Oldyroyd's (85.06) students.

*Note: Sister Ashcraft's scores appear as more widely separated dots because she records scores as whole numbers, while her colleagues record decimal scores.



Hover over the dots to see individual scores as well as each professor's average score.

testingcenter <- testingcenter %>%
  mutate(tooltip = paste("Professor:", Teacher, "<br>Exam Score:", `Test Scores`))

testingcenter$Teacher <- factor(testingcenter$Teacher, levels = c("Baird","Ballou","Oldyroyd", "Ashcraft"))

meanie <- testingcenter %>%
  group_by(Teacher) %>%
  summarise(Average = mean(`Test Scores`))

meanie <- meanie %>%
  mutate(tooltip = paste("Professor:", Teacher, "<br><b>Average Exam Score:</b>", round(Average, 2)))

Mathy <- ggplot(testingcenter, aes(x=Teacher, y=`Test Scores`, color= Teacher, text = tooltip)) +
  geom_point(size = 2) +
  geom_line(data = meanie, aes(x = Teacher, y = Average, group = 1), 
            color = "lightcoral", linewidth = 0.5, inherit.aes = FALSE) +
  geom_point(data = meanie, aes(x = Teacher, y = Average, text=tooltip), 
             color = "lightcoral", size = 2, inherit.aes = FALSE)   +
  scale_color_manual(values = c("red1", "red2", "red3", "darkred")) +
  labs(title="BYU-Idaho Math 100 Professors' Student Exam Performance", x="Professors", y="Exam Scores") + 
  theme_minimal()

ggplotly(Mathy, tooltip = "text")
testingcenter %>%
  group_by(Teacher) %>%
  summarise(`Average Test Scores`=mean(`Test Scores`), .groups="drop") %>%
  pander(caption="Average Test Scores by Teacher")
Average Test Scores by Teacher

| Teacher | Average Test Scores |
|---|---|
| Baird | 87.96 |
| Ballou | 87.94 |
| Oldyroyd | 85.06 |
| Ashcraft | 85.28 |
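A significant Teacher p-value says that at least one teacher's mean differs, but not which ones; Brother Baird's and Sister Ballou's averages, for example, differ by only 0.02 points. A common follow-up is Tukey's Honest Significant Difference procedure. The sketch below runs base R's `TukeyHSD()` on simulated scores with the same four-teacher, two-location structure (the `Location` and `Score` column names are placeholders for this illustration). With the real model, the call would be `TukeyHSD(TCanova, which = "Teacher")`, and given the failed diagnostics its intervals deserve the same caution as the ANOVA.

```r
# SIMULATED scores with the same 4 teacher x 2 location structure (illustration only)
set.seed(100)
sim <- expand.grid(
  Teacher  = c("Baird", "Ballou", "Oldyroyd", "Ashcraft"),
  Location = c("Out", "In"),
  rep      = 1:40
)
sim$Score <- 88 - 5 * (sim$Location == "In") + rnorm(nrow(sim), sd = 10)

fit <- aov(Score ~ Teacher * Location, data = sim)

# All pairwise differences in Teacher means, with family-wise 95% intervals
teacher_pairs <- TukeyHSD(fit, which = "Teacher", conf.level = 0.95)
print(teacher_pairs)
```

Each row of the output is one pair of teachers; an interval for `diff` that excludes 0 (equivalently, `p adj` below 0.05) marks a pair whose means differ significantly.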




Does the use of the testing center impact average exam scores?


As a reminder, this was our p-value for this second factor:

\[\text{Testing Center p-value} = 2.724 \times 10^{-16} < \alpha\]


The extremely low p-value indicates a significant relationship between testing center exams and student performance.

The data shows that average test scores decreased when students took exams in the testing center. Specifically, the average score dropped by 5.38 percentage points (from 89.37 to 83.99), shifting student grades from the B+ range to the B or B- range. While this may seem like a small difference, it can substantially affect a student's overall grade, especially if they are struggling in other areas.
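Because exam scores are already percentages, a drop in the average can be described in two ways that give different numbers: as a difference in percentage points, or as a relative percent change. This small worked example uses the two averages reported in this section (89.37 outside and 83.99 inside the Testing Center).

```r
# Average exam scores by testing location, as reported in this section
avg_out <- 89.37
avg_in  <- 83.99

drop_points  <- avg_out - avg_in             # difference in percentage points
drop_percent <- 100 * drop_points / avg_out  # relative decrease, in percent

round(c(points = drop_points, relative_percent = drop_percent), 2)
```

The 5.38-point drop corresponds to a relative decrease of about 6.0% from the outside-the-Testing-Center average.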



Hover over the dots to see individual and average scores. Additionally, click the Box Plot Style tab and hover over each box to see the five-number summary of each group.

Scatter Plot Style

testingcenter <- testingcenter %>%
  mutate(tooltip = paste("Testing Center:", `Testing Center`, "<br>Exam Score:", `Test Scores`))

testingcenter$`Testing Center` <- factor(testingcenter$`Testing Center`, levels = c("Out", "In"))

meanit <- testingcenter %>%
  group_by(`Testing Center`) %>%
  summarise(Average = mean(`Test Scores`))

meanit <- meanit %>%
  mutate(tooltip = paste("Testing Center:", `Testing Center`, "<br><b>Average Exam Score:</b>", round(Average, 2)))

Mathi <- ggplot(testingcenter, aes(x=`Testing Center`, y=`Test Scores`, color = `Testing Center`, text= tooltip)) +
  geom_point(size = 2, alpha = 0.8) +
  geom_line(data = meanit, aes(x = `Testing Center`, y = Average, group = 1), 
            color = "turquoise", linewidth = 0.5, inherit.aes = FALSE) +
  geom_point(data = meanit, aes(x = `Testing Center`, y = Average, text=tooltip), 
             color = "turquoise", size = 2, inherit.aes = FALSE)   +
  scale_color_manual(values = c("midnightblue", "dodgerblue")) +
  labs(title="BYU-Idaho Math 100 Student Performance: Testing Center vs. Other Locations", x="Outside or inside the testing center?", y="Exam Scores") +
  theme_minimal()

ggplotly(Mathi, tooltip = "text")

Box Plot Style

Mathii <- ggplot(testingcenter, aes(x=`Testing Center`, y=`Test Scores`, fill = `Testing Center`)) +
  geom_boxplot(aes(color = `Testing Center`), alpha = 0.5, linewidth = 1) +
  scale_fill_manual(values = c("midnightblue", "dodgerblue")) +
  scale_color_manual(values = c("midnightblue", "dodgerblue")) +
  geom_line(data = meanit, aes(x = `Testing Center`, y = Average, group = 1), 
            color = "turquoise", linewidth = 0.5, inherit.aes = FALSE) +
  geom_point(data = meanit, aes(x = `Testing Center`, y = Average), 
             color = "turquoise", size = 2, inherit.aes = FALSE)  +
  labs(title="BYU-Idaho Math 100 Student Performance: Testing Center vs. Other Locations", x="Outside or inside the testing center?", y="Exam Scores") +
  theme_minimal()

ggplotly(Mathii)

testingcenter %>%
  group_by(`Testing Center`) %>%
  summarise(`Average Test Scores`=mean(`Test Scores`), .groups="drop") %>%
  pander(caption="Average Exam Scores by Testing Location")
Average Exam Scores by Testing Location

| Testing Center | Average Test Scores |
|---|---|
| Out | 89.37 |
| In | 83.99 |




How has taking exams in the testing center affected each teacher’s students?


As a reminder, this was our p-value for the interaction between the two factors:

\[\text{Interaction p-value} = 0.9505 > \alpha\]

Since our p-value is greater than our level of significance, we found no evidence that the effect of the testing center differs between teachers: no teacher's students performed notably better or worse in the testing center relative to the other teachers' students. This suggests that teaching methods do not noticeably change how students respond to different testing environments.

However, when examining each professor's data separately, we found that for every professor, students who took exams in the testing center scored lower on average than those who took exams elsewhere. Sister Oldyroyd's and Sister Ashcraft's class averages stayed within the B range, dropping from B+ to B-, while Brother Baird's and Sister Ballou's class averages dropped a whole letter grade, from A- to B.

Overall, there was approximately a 6.11% decrease in scores when exams were taken in the testing center. This trend is even more visible in the Split tab, where scores outside the testing center cluster toward the higher end, while scores inside the testing center show a broader distribution skewing toward lower scores.


Hover over the dots to see individual and average scores. Additionally, click between the Combined and Split tabs to further compare and contrast each group.

Combined

create_graph <- function(data, split = FALSE) {
  
  data <- data %>%
    mutate(
      text = paste(
        "Teacher:", Teacher, "<br>",
        "Testing Center:", `Testing Center`, "<br>",
        "Exam Score:", `Test Scores`
      )
    )
  
  
  mean_data <- data %>%
    group_by(Teacher, `Testing Center`) %>%
    summarise(mean_score = mean(`Test Scores`), .groups = "drop") %>%
    mutate(text = paste(
      "Teacher:", Teacher, "<br>",
      "Testing Center:", `Testing Center`, "<br>",
      "<b>Average Exam Score:</b>", round(mean_score, 2), "<br>"))
  
  if (split) {
    data <- data %>%
      mutate(x_axis = interaction(Teacher, `Testing Center`))
    mean_data <- mean_data %>%
      mutate(x_axis = interaction(Teacher, `Testing Center`))
    
    # Use the precomputed x_axis and map text so the hover tooltips show each score
    plot <- ggplot(data, aes(x = x_axis, y = `Test Scores`, color = `Testing Center`, text = text)) +
      geom_point(position = position_dodge(width = 0.8)) +
      stat_summary(fun = "mean", geom = "line", aes(group = `Testing Center`), position = position_dodge(width = 0.8)) +
      labs(
        title = "BYU-Idaho 100 Level Math Exam Scores by Testing Location",
        x = "Professors and Testing Center",
        y = "Test Scores"
      ) +
      theme_minimal() +
      theme(axis.text.x = element_text(angle = 45, hjust = 1))
  } else {
    plot <- ggplot(data, aes(x = Teacher, y = `Test Scores`, group = `Testing Center`, color = `Testing Center`, text = text)) +
      geom_point(size = 2) +
      stat_summary(fun = "mean", geom = "line") +
      geom_point(data = mean_data, aes(x = Teacher, y = mean_score, color = `Testing Center`, text = text), 
                 inherit.aes = FALSE, size = 3) +
      labs(
        title = "BYU- Idaho 100 Level Math Exam Scores",
        x = "BYU-Idaho Professors",
        y = "Test Scores"
      ) +
      theme_minimal()
  }
  
  ggplotly(plot, tooltip = "text") # Add tooltip for custom text
}


create_graph(testingcenter, split = FALSE)

Split

create_graph(testingcenter, split = TRUE)

testingcenter %>%
  group_by(Teacher,`Testing Center`) %>%
  summarise(ave=mean(`Test Scores`), .groups="drop") %>%
  spread(Teacher, ave) %>%
  pander(caption="Average Exam Scores by Teacher and Testing Location")
Average Exam Scores by Teacher and Testing Location

| Testing Center | Baird | Ballou | Oldyroyd | Ashcraft |
|---|---|---|---|---|
| Out | 91.34 | 90.61 | 88.21 | 88.48 |
| In | 85.47 | 85.86 | 82.65 | 83.21 |
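The interaction term asks whether the Out-minus-In gap is the same for every teacher. Computing that gap from the table above is a quick, informal check that does not replace the F test: the drops range from 4.75 to 5.87 points, which is consistent with the large interaction p-value.

```r
# Average exam scores by teacher and testing location, from the table above
avg_out <- c(Baird = 91.34, Ballou = 90.61, Oldyroyd = 88.21, Ashcraft = 88.48)
avg_in  <- c(Baird = 85.47, Ballou = 85.86, Oldyroyd = 82.65, Ashcraft = 83.21)

# Drop in average score (points) for each teacher's students in the Testing Center
drop_by_teacher <- avg_out - avg_in
print(drop_by_teacher)

# With no interaction at all, these drops would be identical (parallel lines)
print(range(drop_by_teacher))
print(mean(drop_by_teacher))
```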



Conclusion


With all things considered, the testing center does affect student exam performance, as does the specific professor.

Based on the 6.11% decrease in average scores, the primary recommendation is to discontinue the use of the testing center, as this decline could persist and worsen in future semesters. However, the decrease could also mean that testing center scores (roughly 82% to 85%) represent student performance more accurately than scores from exams taken elsewhere (roughly 88% to 91%).

If the testing center remains in use, administrators could, given that professors significantly influence exam performance, encourage faculty collaboration on teaching strategies or revise parts of the curriculum to address the performance drop on testing center exams. This also raises questions about whether tutoring now has a stronger impact on test scores than before, and whether more students are seeking retakes because of the testing center requirement.

The effectiveness of any implemented changes will depend on how thoughtfully we respond to these challenges. As educators, our job is to educate, but this extends beyond exclusively teaching students. While sharing our work and insights with colleagues and examining our own practices can make us feel vulnerable, we shouldn’t shy away from using this feedback. Instead, we should apply these lessons to improve both our students’ learning and our own teaching practices to the best of our abilities.

Finally, it is important to note that the data collected failed to meet both requirements of the ANOVA test we conducted, so these findings must be viewed with caution. Additional studies would be necessary to statistically validate these results.



Sources


  • Data Collection

    • Thank you to the equally amazing professors from BYU-Idaho, Brother Baird, Sister Ballou, Sister Oldyroyd, and Sister Ashcraft, for assisting me with my data collection. My Statistics professor and I commend you for your courage to be vulnerable and for the spectacular work you do here at this university!



  • Chat GPT

    • For helping me out when I encountered errors or wanted to do something funky outside my skill set